Abstract

This note examines one of the most crucial questions in causal inference: “How generalizable
are randomized clinical trials?” The question has received a formal treatment recently, using
a non-parametric setting which has led to a simple and general solution. I will describe this
solution and several of its ramifications, and compare it to the way researchers have
attempted to tackle the problem using the language of ignorability. We will see that
ignorability-type assumptions need to be enriched with structural assumptions in order to
capture the full spectrum of conditions that permit generalizations, and in order to judge
their plausibility in specific applications.


1  Transportability  and  Selection  Bias 

The  classical  problem  of  generalizing  experimental  findings  from  the  trial  sample  to  the 
population  as  a  whole,  also  known  as  the  problem  of  “sample  selection-bias”  (Heckman, 
1979;  Bareinboim  et  al.,  2014),  has  received  renewed  attention  in  the  past  decade,  as  more 
researchers  come  to  recognize  this  bias  as  a  major  threat  to  the  validity  of  experimental 
findings  in  both  the  health  sciences  (Stuart  et  al.,  2015)  and  social  policy  making  (Manski, 
2013). 

Since participation in a randomized trial cannot be mandated, we cannot guarantee that
the study population would be the same as the population of interest. For example, the
study population may consist of volunteers, who respond to financial and medical incentives
offered by pharmaceutical firms or experimental teams, so the distribution of outcomes in
the study may differ substantially from the distribution of outcomes under the policy of
interest.

Another impediment to the validity of experimental findings is that the types of individuals
in the target population may change over time (Hotz et al., 2005). For example, as more
individuals become eligible for health insurance, the types of individuals seeking services
would no longer match the types of individuals that were sampled for the study (Stuart
et al., 2015). A similar change would occur as more individuals become aware of the efficacy
of the treatment. The result is an inherent disparity between the target population and the
population under study.
The problem of generalizing across disparate populations has received a formal treatment
in (Pearl and Bareinboim, 2014) where it was labeled “transportability,” and where necessary
and sufficient conditions for valid generalization were established (see also Bareinboim and
Pearl, 2013). The problem of selection bias, though it has some unique features, can also
be viewed as a variant of the transportability problem, thus inheriting all the theoretical
results established in (Pearl and Bareinboim, 2014) that guarantee valid generalizations. I
will describe the two problems side by side and then return to the distinction between the
types of assumptions that are needed for enabling generalizations.

The transportability problem concerns two dissimilar populations, $\Pi$ and $\Pi^*$, and requires
us to estimate the average causal effect $P^*(y_x)$ (explicitly: $P^*(y_x) = P^*(Y = y \mid do(X = x))$) in
the target population $\Pi^*$, based on experimental studies conducted on the source population
$\Pi$.[1] Formally, we assume that all differences between $\Pi$ and $\Pi^*$ can be attributed to a
set of factors S that produce disparities between the two, so that $P^*(y_x) = P(y_x \mid S = 1)$.
The information available to us consists of two parts: first, treatment effects estimated from
experimental studies in $\Pi$ and, second, observational information extracted from both $\Pi$ and
$\Pi^*$. The former can be written $P(y \mid do(x), z)$, where Z is a set of covariates measured in the
experimental study, and the latter are written $P^*(x, y, z) = P(x, y, z \mid S = 1)$ and $P(x, y, z)$,
respectively. In addition to this information, we are also equipped with a qualitative causal
model M that encodes causal relationships in $\Pi$ and $\Pi^*$, with the help of which we need to
identify the query $P^*(y_x)$. Mathematically, identification amounts to transforming the query
expression

$$P^*(y_x) = P(y \mid do(x), S = 1) \tag{1}$$

into a form derivable from the available information $I_{tr}$, where

$$I_{tr} = \{P(y \mid do(x), z),\ P(x, y, z),\ P(x, y, z \mid S = 1)\}. \tag{2}$$

The first two components of $I_{tr}$ represent, respectively, the experimental and observational
findings in $\Pi$, while the third component represents observational findings in $\Pi^*$.
Appendix 1 demonstrates how the query $P^*(y_x)$ can be derived from $I_{tr}$ using assumptions
about the disparities between $\Pi$ and $\Pi^*$ that are encoded in a graph.

The selection bias problem is slightly different. Here the aim is to estimate the average
causal effect $P(y_x)$ in the $\Pi$ population, while the experimental information available to us,
$I_{sb}$, comes from a preferentially selected sample, S = 1, and is given by $P(y \mid do(x), z, S = 1)$.
In addition, we assume access to observational information $P(x, y, z \mid S = 1)$ and
$P(x, y, z)$; the first represents observations obtained from the selected sample, S = 1, and
the second represents observations taken on the population at large. Thus, the selection bias
problem calls for transforming the query $P(y_x)$ to a form derivable from the information set:

$$I_{sb} = \{P(y \mid do(x), z, S = 1),\ P(x, y, z \mid S = 1),\ P(x, y, z)\}. \tag{3}$$

[1] We focus our discussion on the average causal effect (ATE), yet identical considerations apply to other
causal parameters, such as the effect of treatment on the treated (ETT). On the connection between ATE
and ETT, see (Shpitser and Pearl, 2009).

In the Appendix section, we demonstrate how transportability problems and selection
bias problems are solved using the transformations described above. At this point, however,
it is important to note the syntactic differences between the information sets available in the
two problems. $I_{tr}$ is characterized by the fact that S does not appear in the conditioning part
of any do-expression, thus reflecting the fact that we do not have experimental information
from the target population $\Pi^*$. $I_{sb}$, on the other hand, is characterized by the fact that
do-expressions are always conditioned on S, reflecting the fact that we have experimental
information only on the selected sample, S = 1.

The  analysis  reported  in  (Pearl  and  Bareinboim,  2014)  has  resulted  in  an  algorithmic 
criterion  for  deciding  whether  transportability  is  feasible  and,  when  confirmed,  the  algorithm 
produces  an  estimand  for  the  desired  effects  (Bareinboim  and  Pearl,  2013).  The  algorithm 
is  complete,  in  the  sense  that,  when  it  fails,  a  consistent  estimate  of  the  target  effect  does 
not  exist  (unless  one  strengthens  the  assumptions  encoded  in  M). 

There  are  several  lessons  to  be  learned  from  this  analysis  when  considering  generalizing 
experimental  findings. 

1.  The  graphical  criteria  that  authorize  transportability  are  applicable  to  selection  bias 
problems  as  well,  provided  that  the  graph  structures  for  the  two  problems  are  identical. 
This  means  that  whenever  a  selection  bias  problem  is  characterized  by  a  graph  for 
which  transportability  is  feasible,  recovery  from  selection  bias  is  feasible  by  the  same 
algorithm.  (The  Appendix  demonstrates  this  correspondence.) 

2. The assumptions needed for transportability are more involved than the ones usually
invoked for ensuring non-confoundedness, also called “treatment assignment ignorability.”
In graphical terms, these assumptions may require several d-separation tests on
several sub-graphs. It is utterly unimaginable, therefore, that such assumptions could
be managed by unaided human judgment, as is normally assumed in the potential
outcomes literature (Hartman et al., 2015; Stuart et al., 2015).

3. In general, problems associated with generalizing across populations cannot be handled
by balancing disparities between distributions. A given disparity between $P(x, y, z)$
and $P^*(x, y, z)$ may demand different adjustments, depending on the location of S
in the causal structure. A simple example of this phenomenon is demonstrated in
Fig. 3(b) of (Pearl and Bareinboim, 2014) where a disparity in the average reading
ability of two cities requires two different treatments, depending on what causes the
disparity. If the disparity emanates from age differences, adjustment is necessary,
because age is likely to affect the potential outcomes. If, on the other hand, the disparity
emanates from differences in educational programs, no adjustment is needed, since
education, in itself, does not modify response to treatment. Such distinctions, which
may become quite intricate in large systems, are managed automatically in the
graph-based representation.

4. In many instances, generalizations can only be achieved by conditioning on
post-treatment variables, an operation that is generally frowned upon in the potential
outcomes framework (Rosenbaum, 2002, pp. 73-74; Rubin, 2004; Sekhon, 2009) but has
become extremely useful in graphical analysis. The difference between the conditioning
operators used in these two frameworks is reflected in the difference between the
counterfactual expression $P(Y_x = y \mid z)$ and the do-expression $P(Y = y \mid do(X = x), z)$
(Pearl, 2015). The latter expression defines information that is estimable directly from
experimental studies, whereas the former invokes retrospective counterfactuals that
may or may not be estimable empirically.

In the next Section we will discuss the differences between these two conditioning
operators and the benefit of leveraging post-treatment variables in problems concerning
generalization.


2  Ignorability  versus  Admissibility  in  the  Pursuit  of 
Generalizations 

A key assumption in almost all conventional analyses of generalization (from sample to
population) is S-ignorability, written

$$Y_x \perp\!\!\!\perp S \mid Z \tag{4}$$

where $Y_x$ is the potential outcome predicated on the intervention X = x, S is a selection
indicator (with S = 1 standing for selection into the sample) and Z is a set of observed
covariates. This assumption, commonly written as a difference, $Y_1 - Y_0 \perp\!\!\!\perp S \mid Z$, appears in
Hotz et al. (2005); Cole and Stuart (2010); Tipton et al. (2014); Hartman et al. (2015), and
possibly in the work of other researchers confined to potential outcomes analysis. This
assumption states that in every stratum Z = z of the set Z, the potential outcome $Y_x$ is
independent of the factors S that may produce cross-population differences.

Given this assumption, the problem of generalizing across populations has a trivial
solution, which reads: If we succeed in finding a set Z of pre-treatment covariates such that
cross-population differences disappear in every stratum Z = z, then the problem can be
solved by averaging over those strata.[2]

Specifically, if $P(y_x \mid S = 1, Z = z)$ is the z-specific probability distribution of $Y_x$ in the
sample, then the distribution of $Y_x$ in the population at large is given by the
post-stratification formula

$$P(y_x) = \sum_z P(y_x \mid S = 1, z)\, P(z) \tag{5}$$

which is often referred to as re-calibration or re-weighting. Here, $P(z)$ is the probability of
Z = z in the target population (where S = 0). Equation (5) follows from S-ignorability by
conditioning on z and adding S = 1 to the conditioning set (a one-line proof). The proof
fails, however, when no covariate set Z exists that satisfies S-ignorability, in which case the
post-stratification formula will be invalid. Moreover, even when S-ignorability holds, Eq. (5)
would only be applicable if the factor $P(y_x \mid S = 1, z)$ is estimable in the experimental study,
and this will generally not be the case when Z contains post-treatment variables (see Pearl
2015, Fig. 1).
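
As an illustration only, the arithmetic of Eq. (5) can be sketched in a few lines of Python. The function name `post_stratify`, the stratum labels, and all numbers below are hypothetical and are not taken from any study. The sketch performs the weighted average and nothing more: it cannot check whether S-ignorability holds, nor whether the stratum-specific quantities are estimable, and both conditions must be argued separately.

```python
from typing import Hashable, Mapping


def post_stratify(
    effect_by_stratum: Mapping[Hashable, float],
    stratum_weights: Mapping[Hashable, float],
) -> float:
    """Weighted average of stratum-specific effects, in the form of Eq. (5)."""
    if set(effect_by_stratum) != set(stratum_weights):
        raise ValueError("effects and weights must cover the same strata")
    if abs(sum(stratum_weights.values()) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to one")
    return sum(effect_by_stratum[z] * stratum_weights[z] for z in effect_by_stratum)


# Hypothetical inputs, for illustration only.
effect_in_sample = {"young": 0.30, "old": 0.60}      # stands in for P(y_x | S=1, z)
share_in_population = {"young": 0.75, "old": 0.25}   # stands in for P(z)

estimate = post_stratify(effect_in_sample, share_in_population)
print(round(estimate, 3))
```

The same function would serve for Eq. (6) if the weights were replaced by the distribution of Z in the target population of a transportability problem.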

[2] Lacking a procedure for finding Z, this solution addresses only part of the problem, leaving the choice
of Z to unaided intuitive judgement.

Symmetrically, when we consider transportability problems, our query is
$P^*(y_x) = P(y \mid do(x), S = 1)$ (see Eq. (1)), and S-ignorability would permit us to remove the S = 1
condition and obtain the post-stratification formula

$$P^*(y_x) = P(y_x \mid S = 1) = \sum_z P(y_x \mid z)\, P(z \mid S = 1) \tag{6}$$

Similar to Eq. (5), this formula takes a weighted average of the z-specific potential
outcome $Y_x$ over all levels of Z. Here, in syntactic contrast, the average is weighted by $P(z \mid S = 1)$
which is, again, the distribution of Z in the target population (where S = 1). As in the
case of selection bias, Eq. (6) is only useful when S-ignorability holds and when $P(y_x \mid z)$ is
estimable from the experimental data. Unfortunately, when Z contains post-treatment
variables, the former condition will be harder to meet; we shall see that S-ignorability is rarely
satisfied in transportability problems by any set Z containing post-treatment variables.

In  graphical  analysis,  on  the  other  hand,  the  problem  of  generalization  has  been  studied 
using  another  assumption,  labeled  S-admissibility  (Pearl  and  Bareinboim,  2014),  which  is 
defined  by: 


$$P(y \mid do(x), z) = P(y \mid do(x), z, s) \tag{7}$$

or, using counterfactual notation,

$$P(y_x \mid z_x) = P(y_x \mid z_x, s_x).$$

It states that in every treatment regime X = x, the observed outcome Y is conditionally
independent of the selection mechanism S, given Z, all evaluated at that same treatment
regime.

Clearly, S-admissibility coincides with S-ignorability for pre-treatment S and Z; the two
notions differ, however, for treatment-dependent selection and covariates. To see this, consider
the model of Fig. 1(a), and let X stand for education, Z for skill, S for training, and Y for
salary. S-admissibility (7) looks at those people who were assigned x years of education and who
subsequently achieved skill level z, and asks whether their salary Y would depend on their
training S. The graph states that skill alone determines salary, not how it was acquired;
therefore $P(y \mid do(x), z) = P(y \mid do(x), z, s) = P(y \mid z)$, namely, training and education have no
effect on salary once we know z, as shown in the graph.

[Figure 1. Panel (a): nodes S (Training), X (Education), Z (Skill), Y (Salary). Panel (b): nodes
X (Education), Z (Skill), Y (Salary), S (Test).]

Figure 1: (a) A transportability model in which a post-treatment variable Z is S-admissible
but not S-ignorable; (b) A selection-bias model in which Z is both S-admissible and
S-ignorable. Note that S is a root node in (a) and a sink node in (b), where it is a proxy of
Z. In both models, the post-stratification formula (5) is not estimable non-parametrically.

In contrast, S-ignorability, $Y_x \perp\!\!\!\perp S \mid Z$, asks for the role that training plays in the salary
of those individuals who are currently at skill Z = z, had they received x years of schooling.
Surely, unless x is pathologically low, the skill levels attained by these individuals would
depend on the amount of training (S) they receive, and so would their salary Y. We thus
conclude that $Y_x$ is not independent of S given Z, namely, S-ignorability does not hold.
The condition Z = z merely selects a subpopulation for consideration but, unless individuals
in this subpopulation possess some abnormal qualities, they should exhibit the natural
dependence of salary on training.[3]

The Appendix section shows that unbiased generalization across studies is indeed feasible
in scenarios like Fig. 1(a), despite the fact that Z is not S-ignorable. This is facilitated by
the fact that Z is S-admissible, since Z separates Y from S in the graph, and leads to the
following estimand for the target effect:

$$P(y_x \mid S = 1) = \sum_z P(y \mid do(x), z)\, P(z \mid x, S = 1).$$

Note that this estimand invokes a nonconventional average of the z-specific effect, weighted
by the conditional probability $P(z \mid x)$ in the target population.

A similar situation occurs in sample-selection problems such as the one depicted in Fig.
1(b), where generalization from samples to populations through the post-stratification
formula (5) requires S-ignorability. Here, the post-stratification formula (5) is valid because Z
is S-ignorable (Z separates S from $Y_x$ in the graph), yet the formula is useless, because the
z-specific causal effect $P(y_x \mid S = 1, z)$ is not estimable from the experimental study.

Remarkably, the target distribution $P(y_x)$ can be estimated using a modified formula:

$$P(y_x) = \sum_z P(y \mid do(x), z, S = 1)\, P(z \mid x)$$

which follows from the fact that Z is S-admissible. The derivation is presented in Scenario
3 of the Appendix and demonstrates that, regardless of whether Z satisfies S-ignorability
or S-admissibility, experimental findings are not generalizable by standard procedures of
post-stratification. Rather, modified procedures need to be applied, dictated by the graph
structure.

One of the reasons that S-admissibility has received greater attention in the graph-based
literature is that it has a very simple graphical representation: Z and X should separate Y
from S in a mutilated graph, from which all arrows entering X have been removed. Such a
graph depicts conditional independencies among observed variables in the population under
experimental conditions, i.e., where X is randomized.
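
The graphical test just described can be sketched in Python using only the standard library. This is a minimal illustration, not the algorithm of the cited papers: the function names are hypothetical, d-separation is checked through the standard moralized-ancestral-graph construction, and the edge list for Fig. 1(a) (X → Z, S → Z, Z → Y) is my reading of the textual description rather than a transcription of the figure. Unobserved confounders, if present, would have to be added as explicit latent nodes.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, a, b, given):
    """True if `given` d-separates a from b in the DAG described by `parents`."""
    keep = ancestors(parents, {a, b} | set(given))
    # Moralize the ancestral graph: link each child to its parents and
    # link every pair of parents that share a child.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Search for an undirected path from a to b that avoids the conditioning set.
    blocked = set(given)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return False
        for nb in adj[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                queue.append(nb)
    return True


def s_admissible(parents, x, y, s, z):
    """Check that Z and X separate Y from S once arrows entering X are removed."""
    mutilated = {n: ([] if n == x else list(ps)) for n, ps in parents.items()}
    return d_separated(mutilated, y, s, set(z) | {x})


# Assumed structure of Fig. 1(a): each node is mapped to its list of parents.
fig_1a = {"X": [], "S": [], "Z": ["X", "S"], "Y": ["Z"]}

print(s_admissible(fig_1a, "X", "Y", "S", {"Z"}))  # Z together with X blocks S from Y
print(s_admissible(fig_1a, "X", "Y", "S", set()))  # without Z, the path S -> Z -> Y is open
```

Under the assumed structure, the first call reports that Z is S-admissible and the second reports that the empty set is not, which matches the discussion of Fig. 1(a) above.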

[3] To show explicitly that S-ignorability does not hold in Fig. 1(a), one can examine a linear model and
use Eq. (11.28) of (Pearl, 2009, p. 389) to show that

$$E[Y_x \mid Z = z, S = s] = ax + bz + cs$$

with non-zero c.
S-ignorability requires a more elaborate graphical interpretation; it can be verified from
either twin networks (Pearl, 2009, pp. 213-4) or from counterfactually augmented graphs
(Pearl, 2009, p. 341). Using either representation, it is easy to see that S-ignorability is rarely
satisfied in problems in which Z is a post-treatment variable. This is because, whenever S
is an ancestor of Z, or a proxy of such an ancestor, Z cannot separate $Y_x$ from S.

As noted in (Keiding, 1987), the re-calibration formula (5) goes back to 18th century
demographers (Dale, 1777; Tetens, 1786) facing the task of predicting overall mortality (across
populations) from age-specific data. Their reasoning was probably as follows: If the source
and target populations differ in distribution by a set of attributes Z, then to correct for
these differences we need to weight samples by a factor that would restore similarity to
the two distributions. Some researchers view Eq. (5) as a version of Horvitz and Thompson's
(1952) post-stratification method of estimating the mean of a super-population from
un-representative stratified samples. The essential difference between survey sampling
calibration and the calibration required in Eq. (5) is that the calibrating covariates Z are not
just any set by which the distributions differ; they must satisfy the S-ignorability (or
admissibility) condition, which is a causal, not a statistical, condition and is therefore not
discernible from distributions over observed variables. In other words, the re-calibration
formula should depend on disparities between the causal models of the two populations, not
merely on distributional disparities; we discussed this point in Section 1 (item 3) and it is
also demonstrated in the Appendix (Fig. 2(a)).

While S-ignorability and S-admissibility are both sufficient for re-calibrating pre-treatment
covariates Z, S-admissibility goes further and discovers generalizations that leverage both
pre-treatment and post-treatment variables. The three examples discussed in the Appendix
demonstrate this point.


Conclusions 

1. Many opportunities for generalization are opened up through the use of post-treatment
variables. These opportunities remain inaccessible to ignorability-based analysis, partly
because S-ignorability does not always hold for such variables but, mainly, because
ignorability analysis requires information in the form of z-specific counterfactuals, which
is often not estimable from experimental studies.

2. Most of these opportunities have been charted through the completeness results for
transportability (Bareinboim et al., 2014); others can be revealed by simple derivations
in do-calculus, as shown in the Appendix.

3. There is still the issue of assisting researchers in judging whether S-ignorability (or
S-admissibility) is plausible in any given application. Graphs excel in this dimension
because they match the format in which people store scientific knowledge. Researchers
who insist on discerning S-ignorability by appealing to human intuition do so at the
peril of missing opportunities for generalization, or producing biased effect estimates.
Readers can appreciate the magnitude of these perils by examining the simple examples
presented in Fig. 2 of the Appendix; discerning S-ignorability in any one of the three
scenarios is a formidable judgmental task if unaided by graphs.
Appendix

For each of the models represented in Fig. 2 we provide a scenario, a problem specification,
and a derivation of the target estimand.


Figure 2: (a) Generalizable transportability problem in which Z is S-admissible but
S-ignorability does not hold. (b) Generalizable selection-bias problem in which Z is
S-admissible but S-ignorability does not hold. (c) Generalizable selection-bias problem in
which S-admissibility and S-ignorability both hold, yet post-stratification (Eq. (5)) fails to
estimate the target treatment effect $P(y_x)$.


Scenario 1 (Figure 2(a))

X = treatment, Y = outcome, Z = a bio-marker believed to mediate between treatment
and outcome, S = a factor (say diet) that makes the effect of X on Z different in the two
populations, $\Pi$ and $\Pi^*$. The curved dashed arc between X and Y represents the presence of
unobserved confounders.

Problem formulation:

Needed:

$$P^*(y_x) = P(y \mid do(x), S = 1)$$

Information set available:

$$I_{tr} = \{P(y \mid do(x), z),\ P(x, y, z \mid S = 1),\ P(x, y, z)\}.$$

Assumptions: S-admissibility (deduced from Fig. 2(a))

$$P(y \mid do(x), z) = P(y \mid do(x), z, s)$$


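Derivation:

$$
\begin{aligned}
P^*(y_x) &= P(y \mid do(x), S = 1) \\
&= \sum_z P(y \mid do(x), S = 1, z)\, P(z \mid do(x), S = 1) \\
&= \sum_z P(y \mid do(x), z)\, P(z \mid do(x), S = 1) \\
&= \sum_z P(y \mid do(x), z)\, P(z \mid x, S = 1)
\end{aligned}
$$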

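Each step in this derivation follows from probability theory and the assumption of
S-admissibility, which permits us to remove the condition S = 1 from the first factor of the second
line. The last step replaces $P(z \mid do(x), S = 1)$ with $P(z \mid x, S = 1)$, which is permissible
when X and Z are unconfounded, as the description of Fig. 2(a) indicates (the only confounding
arc is between X and Y). The result is an estimand in which the condition S = 1 does not
appear in any do-expression; hence it is estimable from $I_{tr}$.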
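
The derivation can be checked numerically on a small discrete model. The following Python sketch is an illustration under stated assumptions: the structure (a latent U confounding X and Y, X → Z → Y, and S modifying the mechanism of Z) follows the verbal description of Scenario 1, while all variable names and probability values are hypothetical. The sketch computes the estimand using only quantities of the kind contained in $I_{tr}$ and compares it with the effect obtained by intervening directly in the target population.

```python
from itertools import product

# Hypothetical parameters of a binary model (not taken from any data).
P_U = {0: 0.6, 1: 0.4}                                  # latent confounder of X and Y
P_X1_GIVEN_U = {0: 0.2, 1: 0.7}                         # P(X=1 | u)
P_Z1_GIVEN_XS = {(0, 0): 0.1, (1, 0): 0.5,              # P(Z=1 | x, s); s=0 is the source,
                 (0, 1): 0.3, (1, 1): 0.9}              # s=1 is the target
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.4,
                 (1, 0): 0.5, (1, 1): 0.8}              # P(Y=1 | z, u)


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(u, x, z, y, s, do_x=None):
    """P(u, x, z, y) in population s; with do_x set, X is fixed by intervention."""
    px = bern(P_X1_GIVEN_U[u], x) if do_x is None else float(x == do_x)
    return (P_U[u] * px * bern(P_Z1_GIVEN_XS[(x, s)], z)
            * bern(P_Y1_GIVEN_ZU[(z, u)], y))


def prob(event, s, do_x=None):
    """Total probability of the worlds (x, z, y) satisfying `event`, with U summed out."""
    return sum(joint(u, x, z, y, s, do_x)
               for u, x, z, y in product((0, 1), repeat=4) if event(x, z, y))


def transport_estimate(x):
    """Sum over z of P(Y=1 | do(x), z) from the source times P(z | x, S=1) from the target."""
    total = 0.0
    for z in (0, 1):
        experimental = (prob(lambda x_, z_, y_: z_ == z and y_ == 1, s=0, do_x=x)
                        / prob(lambda x_, z_, y_: z_ == z, s=0, do_x=x))
        observational = (prob(lambda x_, z_, y_: x_ == x and z_ == z, s=1)
                         / prob(lambda x_, z_, y_: x_ == x, s=1))
        total += experimental * observational
    return total


def target_effect(x):
    """P*(Y=1 | do(x)), computed by intervening directly in the target population."""
    return prob(lambda x_, z_, y_: y_ == 1, s=1, do_x=x)


for x in (0, 1):
    print(x, round(transport_estimate(x), 6), round(target_effect(x), 6))
```

In this model the two columns agree for each x, whereas the effect measured in the source population alone would differ from both, since the mechanism of Z is not the same in the two populations.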

Scenario 2 (Figure 2(b))

This is a selection-bias version of the transportability problem presented in Scenario 1.
Assume variable L stands for “location” and that selection for the study prefers subjects from
one location over another (Hotz et al., 2005). The task is to estimate the average causal
effect over the entire population.

Problem formulation:

Needed:

$$P(y_x) = P(y \mid do(x))$$

Information set available:

$$I_{sb} = \{P(y \mid do(x), z, S = 1),\ P(x, y, z \mid S = 1),\ P(x, y, z)\}.$$

Assumptions: S-admissibility (deduced from the model of Fig. 2(b))

$$P(y \mid do(x), z) = P(y \mid do(x), z, s)$$

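Derivation:

$$
\begin{aligned}
P(y_x) &= P(y \mid do(x)) \\
&= \sum_z P(y \mid do(x), z)\, P(z \mid do(x)) \\
&= \sum_z P(y \mid do(x), z, S = 1)\, P(z \mid do(x)) \\
&= \sum_z P(y \mid do(x), z, S = 1)\, P(z \mid x)
\end{aligned}
$$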


The first term in the sum is estimable from the biased experimental study, while the
second is estimable from observations on the target population.
Scenario 3 (Figure 2(c))

This is another selection-bias version of the problem presented in Scenario 1. Assume Z
represents a post-treatment complication and, naturally, people with complications are more
likely to enter the database.

Problem formulation:

The problem is identical to that of Scenario 2 with the exception that now both
S-admissibility and S-ignorability hold for variable Z. The former can be seen from its
graphical definition, since Z and X separate Y from S, and the latter by noting that Z separates S
from all exogenous factors that affect Y.

Derivation:

The same as in Scenario 2. Again, we see that the final estimand calls for averaging the
z-specific effect in the experiment over all strata of Z, but now the average is weighted by
the conditional probability $P(z \mid x)$ instead of the marginal $P(z)$ that appears in Eq. (5).
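
The contrast between the two weightings can be made concrete with a small numerical sketch in Python. The structure assumed here (a latent U confounding X and Y, X → Z → Y, and selection S depending on Z alone) is my reading of the description of Scenario 3, and the confounder U, the function names, and every probability value are hypothetical. The sketch takes the z-specific effect from the selected experimental sample and averages it once with $P(z \mid x)$ and once with the marginal $P(z)$.

```python
from itertools import product

# Hypothetical parameters of a binary model (not taken from any data).
P_U1 = 0.4                                   # P(U=1), latent confounder of X and Y
P_X1_GIVEN_U = {0: 0.2, 1: 0.7}              # P(X=1 | u)
P_Z1_GIVEN_X = {0: 0.1, 1: 0.6}              # P(Z=1 | x), Z is post-treatment
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.4,
                 (1, 0): 0.5, (1, 1): 0.8}   # P(Y=1 | z, u)
P_S1_GIVEN_Z = {0: 0.2, 1: 0.9}              # P(S=1 | z), selection favors Z=1


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(u, x, z, y, s, do_x=None):
    """P(u, x, z, y, s); with do_x set, X is fixed by intervention."""
    px = bern(P_X1_GIVEN_U[u], x) if do_x is None else float(x == do_x)
    return (bern(P_U1, u) * px * bern(P_Z1_GIVEN_X[x], z)
            * bern(P_Y1_GIVEN_ZU[(z, u)], y) * bern(P_S1_GIVEN_Z[z], s))


def prob(event, do_x=None):
    """Total probability of the worlds (u, x, z, y, s) satisfying `event`."""
    return sum(joint(*w, do_x=do_x)
               for w in product((0, 1), repeat=5) if event(*w))


def effect_in_selected_sample(x, z):
    """P(Y=1 | do(x), z, S=1), as an experiment on the selected sample would report it."""
    num = prob(lambda u, x_, z_, y, s: z_ == z and y == 1 and s == 1, do_x=x)
    den = prob(lambda u, x_, z_, y, s: z_ == z and s == 1, do_x=x)
    return num / den


def p_z_given_x(x, z):
    """Observational P(z | x) in the whole population."""
    return (prob(lambda u, x_, z_, y, s: x_ == x and z_ == z)
            / prob(lambda u, x_, z_, y, s: x_ == x))


def p_z(z):
    """Observational marginal P(z) in the whole population."""
    return prob(lambda u, x_, z_, y, s: z_ == z)


def population_effect(x):
    """P(Y=1 | do(x)) in the whole population, computed directly from the model."""
    return prob(lambda u, x_, z_, y, s: y == 1, do_x=x)


def conditional_weighting(x):
    return sum(effect_in_selected_sample(x, z) * p_z_given_x(x, z) for z in (0, 1))


def marginal_weighting(x):
    return sum(effect_in_selected_sample(x, z) * p_z(z) for z in (0, 1))


for x in (0, 1):
    print(x, round(population_effect(x), 6),
          round(conditional_weighting(x), 6), round(marginal_weighting(x), 6))
```

In this model the average weighted by $P(z \mid x)$ reproduces the population effect, while the average weighted by the marginal $P(z)$ does not, because Z responds to treatment.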

Remark 1 Note that, in Scenario 2, if variable L is observable, then the selection bias
problem can be solved by re-calibration over L, since L is treatment-independent and satisfies
S-ignorability (and S-admissibility). It is only when L is unobserved that we must resort to
Z, a post-treatment variable that does not satisfy S-ignorability.